perm filename A68.TEX[254,RWF] blob sn#856529 filedate 1988-04-27 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00002 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	\magnification\magstephalf
C00024 ENDMK
C⊗;
\magnification\magstephalf
\input macrwf.tex
\def\today{\ifcase\month\or
  January\or February\or March\or April\or May\or June\or
  July\or August\or September\or October\or November\or December\fi
  \space\number\day, \number\year}
\def\Lsim{\buildrel L \over\sim}
\def\Lapprox{\buildrel L \over\approx}
\def\Psim{\buildrel P \over\sim}
\def\Papprox{\buildrel P \over\approx}
\baselineskip 14pt
\def\fnc#1{\mathop{\rm #1}\nolimits}
\rm
\line{\sevenrm a68154.tex \today\hfill}

\bigskip
\noindent{\bf Equivalence in Languages}

Let $L$ be any language, whether or not recognized by a machine. We define
two equivalence relations associated with~$L$. 

Two strings $x↓1$ and $x↓2$ are {\it prefix equivalent\/} in~$L$, 
$x↓1\Lsim x↓2$, if for every string~$y$, both or neither of $x↓1y$ and
$x↓2y$ belong to~$L$. That is, 
$x↓1\Lsim x↓2$ iff $\forall y(x↓1y\in L\equiv x↓2y\in L)$.
To put it in another way, define $x\backslash L$, the {\it quotient\/}
of~ $x$ into $L$, as the set of strings~$y$ such that $xy\in L$; then
$x↓1\zap x↓2$ iff $x↓1\backslash L=x↓2\backslash L$. The latter definition
makes it clear that $\Lsim$ is an equivalence relation.

Two strings $y↓1$ and $y↓2$ are {\it infix equivalent\/} in~$L$, 
$y↓1\Lapprox y↓2$, if for all strings $x$ and~$z$, both or neither of
$xy↓1z$ and $xy↓2z$ belong to~$L$. That is $y↓1\Lap y↓2$ iff
$\forall x,z(xy↓1z\in L\equiv xy↓2z\in L)$. 

\smallskip
\noindent{\bf Drill:} Show $\Lapprox$ is an equivalence relation.

\proclaim Theorem. $y↓1\Lapprox y↓2$ implies $u↓1\Lsim u↓2$.

Proof:  In the definition of infix equivalence, set $x=\Lambda$.  The
definition then states that $∀z(y↓1z\in L ≡ y↓2z\in L)$, the definition of
prefix equivalence.

\proclaim Theorem.  $u↓1 \Lsim u↓2$ and $v↓1 \Lapprox v↓2$ implies
$u↓1v↓1 \Lsim u↓2v↓2$.

Proof:  For all $w$, $u↓1v↓1w \in L ≡ u↓2v↓1w \in L ≡ u↓2v↓2w \in L$.

\proclaim Theorem.  $u↓1 \Lapprox u↓2$ and $v↓1 \Lapprox v↓2$ implies
$u↓1v↓1 \Lapprox u↓2v↓2$.

Proof:  For all $x$, $z$, $xu↓1v↓1z \in L ≡ xu↓2v↓1z \in L ≡ xu↓2v↓2z \in L$.
\smallskip
{\bf Drill:}  Show if $u↓1 \Lsim u↓2$ then $u↓1v \Lsim u↓2v$.
\smallskip
{\bf Drill:}  Show if for all $u$, $uv↓1 \Lsim uv↓2$, then $v↓1 \Lapprox v↓2$.

Let $[x]↓{\Lsim}$ and $[x]↓{\Lapprox}$ be the set of strings respectively
prefix equivalent to $x$, and infix equivalent to $x$.  We define operations
analogous to concatenation on these equivalence classes:
$$\eqalign{[x]↓{\Lsim} \otimes [y]↓{\Lapprox} &= [x \otimes y]↓{\Lsim},\cr
[x]↓{\Lapprox} \otimes [y]↓{\Lapprox} &= [x \otimes y]↓{\Lapprox}.\cr}$$

That is, in the first case we pick any member $x↓1$ of $[x]↓{\Lsim}$ and any
member $y↓1$ of $[y]↓{\Lapprox}$, concatenate them, and take the prefix
equivalence class $[x↓1y↓1]↓{\Lsim}$.  Because $x \Lsim x↓1$ and $y \Lapprox y↓1$
implies $xy \Lsim x↓1y↓1$, the result is independent of which $x↓1$ and $y↓1$
are selected as representatives of $[x]$ and $[y]$.

From here on, if the superscript of $\sim$ or $\approx$ is $L$, we omit it.
\smallskip
{\bf Drill:}  Show $([x]↓\sim \otimes [y]↓\approx )\otimes [x]↓\approx =
[x]↓\sim \otimes ([y]↓\approx \otimes [z]↓\approx)$
\smallskip
{\bf Drill:}  Show $([x]↓\approx \otimes [y]↓\approx) \otimes [z]↓\approx =
[x]↓\approx \otimes ([y]↓\approx \otimes [z↓\approx])$

Because of the two results above, we need not parenthesize concatenations of
three or more equivalence classes.
\smallskip
{\bf Drill:} $[\Lambda]↓\approx \otimes [x]↓\approx = [x]↓\approx =
[x]↓\approx \otimes [\Lambda]↓\approx$.

A set like the set of infix equivalence classes that has a operation like
concatenation that is associative, and has an identity element such as $[\Lambda]
↓\approx$, is called a {\it monoid}.  Composition of relations on a particular
set is another monoid.  Other examples are concatenation of strings, matrix
multiplication, and composition of functions.

Assume language $L$ is recognized by a deterministic program $P$ for (not
necessarily finite) machine $M$.  Let ${\cal C}↓x$ be the configuration of $M$
(omitting input state) immediately after $P$ reads $x$.  Clearly if ${\cal C}
↓x = {\cal C}↓y$, $x \sim y$.  Starting in any configuration ${\cal C}$ and
reading string $x$, we arrive at a configuration we call ${\cal C} \tau↓x$ 
Clearly if $\tau ↓x$ is the same function as $\tau ↓y$, $x \approx y$.
We define two equivalence relations for program $P$:
$$\displaylines{x↓1 \Psim x↓2\hbox{ if }{\cal C}↓x = {\cal C}↓y\cr
x↓1 \Psim x↓2\hbox{ if } \tau↓{x↓{1}} = \tau↓{x↓{2}}\cr}$$
Then $x↓1 \Psim x↓2$ implies $x↓1 \Lsim x↓2$ and $x↓1 \Papprox x↓2$ implies
$x↓1 \Lapprox x↓2$.

If$P$ is a program for a finite machine, there are only finitely many possible
configurations ${\cal C}↓x$, so $\Psim$ has only finitely many equivalence
classes.  Because $x↓1 \Psim x↓2$ implies $x↓1 \Lsim x↓2$, $\Lsim$ has only
finitely many equivalence classes.  Similarly, there are only finitely many
transition functions $\tau↓x$, so $\Papprox$ and $\Lapprox$ have only finitely
many equivalence classes.

Conversely, let $L$ be any language.  Suppose that $\Lsim$ or $\Lapprox$ has
finitely many equivalence classes.  Because $x↓1 \Lapprox x↓2$ implies
$x↓1 \Lsim x↓2$, we can be sure that $\Lsim$ has finitely many equivalence
classes.  Let these classes be the control states of a finite machine, with
$[\Lambda]↓\sim$ as the start state.  The program is all instructions of 
the form $\langle[x]↓\sim → [xa]↓\sim, \hbox{scan}\,a\rangle$.  The accepting
control states are those $[x]$ for which $x\in L$.  Then the program recognizes
$L$. {\bf Drill:}  Prove it.

No finite recognizer for $L$ can have fewer control states than the number of
prefix equivalence clases for $L$.  If $x↓1 \not\sim x↓2$, ${\cal C}↓{x↓{1}}
\not= {\cal C}↓{x↓{2}}$.  Take one representative $x$ from each prefix equivalence
class.  The corresponding control states ${\cal C}↓x$ are all distinct, so the
number of control states is at least the number of prefix equivalence classes.
Therefore, the program of the previous paragraph that uses the prefix
equivalence classes as control states is the minimum program for $L$ on a
finite machine.

Suppose a machine is given the characters of a string $x$ in some specific
out-of-order fashion, perhaps backwards, or perhaps with the first half of
$x$ interleaved with the second half.  We want to find a program to determine
if $x\in L$.  A typical way to do this is to keep track of the infix
equivalence class of each contiguous piece of $x$ that the program has read.
For example, if $x$ is $a↓1a↓2a↓3a↓4a↓5a↓6a↓7$, let the input $y$ be
$a↓4a↓3a↓5a↓2a↓6a↓1a↓7$, obtained by starting at the center and working outward.
The program has initial state $\langle[\Lambda ], 0,y\rangle$.  The
instructions are $\langle[x]↓\approx → [xa]↓\approx, 0→1, \hbox{scan}\,a\rangle$
and $\langle[x]↓\approx → [ax]↓\approx, 1→0, \hbox{scan}\,a\rangle$ with accepting
configurations $\langle[x]↓\approx, 1, \Lambda \rangle$ for which
$x\in L$.
\smallskip
{\bf Exercise:}  Find programs for the two reorderings in the first
sentence of this paragraph.
\smallskip
{\bf *Exercise:}  Find a program for the strings $y$ such that $y$ is the middle
third of a string $x\in L$.

An example of a language for which the equivalence classes are infinite is
the one-parenthesis language $L↓{\tt()}$.  Two strings are prefix-equivalent if
they have the same number of unmatched left parentheses, like {\tt((((((()))))}
and {\tt(()(}, or if both are illegal prefixes, like {\tt)} and {\tt(()))((}.
Two strings are infix-equivalent if they have the same number of unmatched
left parentheses and the same number of unmatched right parentheses, like
{\tt))(((} and {\tt)()(()))(()((}.
\smallskip
{\bf Exercise:}  Find a program for a one-counter machine to recognize $L↓{\tt()}$.

\goodbreak
\noindent{\bf Equivalence and Minimization of DFRs}

Let $P$ be a standardized DFR with control states $Q$, initial state $q↓0$, and
result function $\omega$.  We define {\it equivalence} among states by
$$q \sim q' ≡ ∀x q \tau ↓x (\omega = q' \tau ↓x \omega)$$.
Then $q \not\sim q' ≡ ∃x (q \tau ↓x \omega \not= q' \tau ↓x \omega)$. We can
compute the equivalence relation by computing the inequivalence relation and
complementing it.  We can compute the inequivalence relation iteratively by
allowing the length of $x$ to be successively at most zero, one, two, etc.

Define $\nu ↓0 = \{\langle q,\ q'\rangle \mid q \omega \not= q' \omega\}$.
Taking $x = \Lambda$ in the definition of $\sim$ shows $\nu ↓0 \subseteq
\not\sim$.

Define $\nu ↓{i+1} = \nu ↓i ∪ \bigcup ↓{a \in \Sigma} \tau ↓a \circ \nu ↓i \circ
\tau ↓a↑{-1}$.

By induction, assume $\nu ↓i \subseteq \not\sim$.

If $q(\tau ↓a \circ \nu ↓i \circ \tau ↓a↑{-1})q'$, we have the diagram below.

\vskip 1.5in
to which we add
\vskip 1.5in
(because $q↓1 \not\sim q↓1'$) to get
\vskip 1.5in
showing that $q \not\sim q'$. Then $\tau↓a \circ \nu ↓i \circ \tau ↓a↑{-1}
\subseteq \not\sim$.

Because $\nu ↓i \subseteq \nu↓{i+1}$, and $\nu ↓i$ is a subset of the finite set
$Q \times Q$, eventually some $\nu↓n = \nu ↓{n+1}$, and by the iterative equation
$\nu ↓{n+1} = \nu ↓{n+2}$, etc.  We call this limiting value $\nu ↓\infty$.  We
have seen that $\nu ↓\infty \subseteq \not\sim$.  Conversely, if $q \not\sim q'$,
there is some $x$ for which $q \tau ↓x \omega  \not= q' \tau ↓x \omega$.
Letting $i=|x|$, $q \nu ↓i q'$, and $q \nu ↓\infty q'$,

\smallskip
\noindent{\bf Drill:}  Prove it by induction.

\noindent Therefore $\nu ↓\infty = \not\sim$, and $\overline \nu ↓\infty$ is the
equivalence relation on $Q$.

To test two DFRs with distinct control states for equivalence, combine them
into one program.  They accept the same language if their start states are 
equivalent in the combined program.

Next we show that any standardized program equivalent to $P$ has at least as
many control states as there are equivalence classes of states in $P$.  If
$q \not\sim q'$,
\vskip 1.5in
Let $P'$ be a standardized program equivalent to $P$, and $q↓x$, $q↓{x'}$ be
the states reached by $Pi$ after reading $x$ and $x'$ respectively.  If
$q ↓x = q ↓{x'}$, $P'$ would classify $xy$ and $x'y$ equally, unlike $P$.
Therefore, corresponding to a set of inequivalent states $\{q, q', q'', 
q''',...\}$ of $P$, one from each equivalence class, we find a set of
distinct states $\{q↓x, q↓{x'}, q↓{x''},...\}$ of $P'$.  No program equivalent
to $P$ can have fewer control states than this.

Next we construct a minimized program $P↓{[\,]}$ equivalent to $P$.  Let $Q↓{[\,]}$
be the set of equivalence classes $[q]$ of the control states of $P$.  Define
$\omega ↓{[\,]}$ by
$$[q] \omega ↓{[\,]} = q \omega.$$
Let the instructions be
$$\langle [q] → [q \tau ↓a], \hbox{scan}\,a \rangle.$$
The representation function $\rho$ defined by $\langle[q], x\rangle\rho \langle q, 
x \rangle$ provides a ready proof that $P↓{[\,]}$ simulates $P$.  We have already shown that the size
of its control state set is minimal.

To prove $P↓{[\,]}$ deterministic, we show that when in control state $[q]$ and
scanning $a$, there is a unique next control state.

If the instructions
$$\displaylines{\langle[q] → [q \tau ↓a], \hbox{scan}\,a\rangle \hbox{ and }\cr
\langle [q'] → [q' \tau ↓a], \hbox {scan}\,a\rangle\cr}$$
go to distinct equivalence classes $[q \tau ↓a]$ and $[q' \tau ↓a]$, then for
some $x$ $q \tau ↓a \tau ↓x \omega \not= q' \tau ↓a \tau ↓x \omega$, $q \not\sim
q'$ and $[q] \not= [q']$.
\smallskip
{\bf Drill:}  Show $P ↓{[\,]}$ simulates $P$.
\smallskip
{\bf Exercise:}  Show that any two equivalent minimized DFRs are the same except
for the naming of the control states.

Given a DFR, $P$, for language $L$, we may want to test whether $x \Lsim x'$.  One
way is to have the minimized program $P ↓{[\,]}$ read both $x$ and $x'$, which are
prefix equivalent iff the same control state is reached.

Another is to compute the state equivalence relation for $P$, then have $P$
read both $x$ and $x'$, which are prefix equivalent iff equivalent control 
states are reached.

A third way has $P$ read both $x$ and $x'$, arriving at control states $q$,
$q'$ respectively.  Define relation $\rho ↓i$ by
$$\eqalign{\rho ↓0 &= \{\langle q, q'\rangle \},\cr
\rho ↓{i+1} &= \rho ↓i ∪ \bigcup ↓{a \in \Sigma} \tau ↓a↑{-1} \circ \rho ↓i
\circ \tau ↓a.\cr}$$

If any pair $\langle q↓1$, $q↓1'\rangle$ for which $q↓1 \omega \not= q↓1'
\omega$ belongs to $\rho ↓i$, then
$x \not\sim x'$, and in fact there is a string $y$ of length $\le
i$ for which arguments $xy$ and $x'y$ give different results.  Otherwise
eventually $\rho ↓{i+1} = \rho ↓i$, and $x \sim x'$.

To test for infix equivalence (again, in the language $L$ of program $P$) one
way is to compute $\tau ↓x$ and $\tau ↓y$ in the minimized program $P↓{[\,]}$,
and test these relations for equality.  Another way begins by determining the
set $S$ of reachable control states.  Define the relation 
$\rho ↓0 = \tau ↓x↑{-1} \circ I↓s \circ \tau ↓{x'}$, which
holds between $q$ and $q'$ iff for some $u$, reading $ux$ and $ux'$ 
respectively takes $P$ to $q$ and $q'$ respectively.  Proceed as in the
third method above.
\bigskip
\parindent0pt
\copyright 1988 Robert W. Floyd

First draft April 25, 1988.

\bye